# Low-resource deployment
## DiffuCoder-7B-cpGRPO-8bit
mlx-community · Large Language Model · Other · 272 downloads · 2 likes

A code generation model converted to MLX format from apple/DiffuCoder-7B-cpGRPO, designed to give developers an efficient code generation tool.
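The practical appeal of an 8-bit MLX conversion such as DiffuCoder-7B-cpGRPO-8bit is memory headroom. A back-of-envelope sketch of weight storage at different precisions (real footprints also include the KV cache, activations, and per-block scale metadata, so treat these as lower bounds):

```python
# Rough weight-memory estimate for a 7B-parameter model at different precisions.
# This ignores KV cache, activations, and quantization metadata, which all add
# to the real footprint.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for n_params at bits_per_weight."""
    return n_params * bits_per_weight / 8 / 1024**3

n = 7e9  # 7B parameters
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_gib(n, bits):.1f} GiB")
```

Halving the bits halves the weight memory, which is what moves a 7B model from "needs a workstation GPU" to "fits on a laptop".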
## UniReason-Qwen3-14B-RL-GGUF
mradermacher · Apache-2.0 · Large Language Model · Transformers · English · 272 downloads · 1 like

A static quantization of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical-reasoning research.
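Static GGUF quantization of the kind used in builds like this comes down to block-wise rounding: split the weights into fixed-size blocks and store one scale plus low-bit integers per block. A minimal sketch in the spirit of the Q8_0 format, assuming the Q8_0 block size of 32 (the real on-disk layout is defined in llama.cpp and stores the scale as fp16):

```python
import numpy as np

# Illustrative sketch of block quantization in the spirit of GGUF's Q8_0:
# weights are split into blocks of 32, each block stores one scale and
# 32 signed 8-bit values.

BLOCK = 32

def quantize_q8_0(x: np.ndarray):
    blocks = x.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales

def dequantize_q8_0(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Because each block gets its own scale, a few large weights in one block cannot blow up the rounding error everywhere else.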
## Gemma-3n-E2B-GGUF
mradermacher · Large Language Model · Transformers · English · 207 downloads · 0 likes

A static quantization of Google's Gemma-3n-E2B model, offering various quantization types to balance model size and performance.
## Delta-Vector Austral-70B-Winton GGUF
bartowski · Apache-2.0 · Large Language Model · English · 791 downloads · 1 like

A quantized build of Delta-Vector's Austral-70B-Winton. Quantization cuts the model's storage and compute requirements while preserving most of its quality, making it practical on limited hardware.
## Gama-12B-i1-GGUF
mradermacher · Large Language Model · Transformers · Multilingual · 559 downloads · 1 like

A quantization of Gama-12B offering files in several quantization types, suited to text generation in English and Portuguese.
## Gama-12B-GGUF
mradermacher · Large Language Model · Transformers · Multilingual · 185 downloads · 1 like

Gama-12B is a multilingual large language model, offered here in several quantized versions to trade off performance and precision.
## LongWriter-Zero-32B-i1-GGUF
mradermacher · Apache-2.0 · Large Language Model · Transformers · Multilingual · 135 downloads · 1 like

A quantization of THU-KEG/LongWriter-Zero-32B that supports both Chinese and English, suited to long-context scenarios such as reinforcement-learning research and long-form writing.
## Skywork-SWE-32B GGUF
bartowski · Apache-2.0 · Large Language Model · 921 downloads · 2 likes

Skywork-SWE-32B is a 32B-parameter large language model, quantized with llama.cpp's imatrix method so it can run efficiently in resource-constrained environments.
## NVIDIA AceReason-Nemotron-1.1-7B GGUF
bartowski · Other · Large Language Model · Multilingual · 1,303 downloads · 1 like

A quantized version of NVIDIA's AceReason-Nemotron-1.1-7B that improves runtime efficiency across different hardware while retaining most of the original performance and quality.
## OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT GGUF
bartowski · Apache-2.0 · Large Language Model · Multilingual · 720 downloads · 1 like

A quantized version of OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT that runs more efficiently across a range of hardware.
## Qwen3-Embedding-0.6B ONNX uint8
electroglyph · Apache-2.0 · Text Embedding · 112 downloads · 8 likes

A uint8-quantized ONNX export of Qwen/Qwen3-Embedding-0.6B that shrinks the model while preserving retrieval performance.
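uint8 embedding quantization works because retrieval only needs the ranking of similarities to survive, not the exact float values. A small illustration with synthetic vectors (the shared affine scale/offset scheme below is an assumption about how such exports are typically calibrated, not the specific recipe used for this model):

```python
import numpy as np

# Sketch of uint8 embedding quantization: map each float32 value into
# [0, 255] with a shared scale and offset, then check that cosine
# similarities to a query are nearly unchanged after the round trip.

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 64)).astype(np.float32)   # mock document embeddings
query = rng.normal(size=64).astype(np.float32)

lo, hi = docs.min(), docs.max()
scale = (hi - lo) / 255.0

def to_uint8(x):
    return np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)

def from_uint8(q):
    return q.astype(np.float32) * scale + lo

def cosine_sims(vecs, q):
    return vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))

sims_f32 = cosine_sims(docs, query)
sims_u8 = cosine_sims(from_uint8(to_uint8(docs)), query)
print("similarity correlation:", float(np.corrcoef(sims_f32, sims_u8)[0, 1]))
```

The correlation stays extremely close to 1, which is why nearest-neighbor results barely change while storage drops 4x versus float32.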
## Wan2.1-T2V-14B FusionX VACE GGUF
QuantStack · Apache-2.0 · Text-to-Video · English · 461 downloads · 3 likes

A text-to-video model produced by quantization conversion from its base model, supporting a range of video generation tasks.
## Wan2.1-T2V-14B FusionX GGUF
QuantStack · Apache-2.0 · Text-to-Video · English · 563 downloads · 2 likes

A quantized text-to-video model that converts the base model to GGUF format and can be used in ComfyUI, adding another option for text-to-video generation.
## DeepSeek-R1-0528-Qwen3-8B 6bit
mlx-community · MIT · Large Language Model · 582 downloads · 1 like

A 6-bit quantized conversion of DeepSeek-R1-0528-Qwen3-8B, suited to text generation under the MLX framework.
## Blitzar-Coder-4B-F.1-GGUF
prithivMLmods · Apache-2.0 · Large Language Model · Transformers · 267 downloads · 1 like

Blitzar-Coder-4B-F.1 is an efficient multilingual coding model fine-tuned from Qwen3-4B. It supports more than ten programming languages and offers strong code generation, debugging, and reasoning capabilities.
## Echelon-AI Med-Qwen2-7B GGUF
featherless-ai-quants · Large Language Model · 183 downloads · 1 like

GGUF quantized files for the Echelon-AI/Med-Qwen2-7B model, provided with support from Featherless AI, intended to improve efficiency and reduce running costs.
## Gemma-3n-E4B-it
google · Image-to-Text · Transformers · 1,690 downloads · 81 likes

Gemma 3n is a lightweight, state-of-the-art open multimodal model family from Google, built on the same research and technology as the Gemini models, accepting text, audio, and visual inputs.
## Bielik-11B-v2.6-Instruct-GGUF
speakleash · Apache-2.0 · Large Language Model · Transformers · 206 downloads · 5 likes

Bielik-11B-v2.6-Instruct is a Polish large language model developed by SpeakLeash and ACK Cyfronet AGH, fine-tuned from Bielik-11B-v2 for instruction-following tasks.
## Phi-3.5-mini-instruct
Lexius · MIT · Large Language Model · Transformers · Other · 129 downloads · 1 like

Phi-3.5-mini-instruct is a lightweight, state-of-the-art open model built on the datasets used for Phi-3, with a focus on high-quality, reasoning-dense data. It supports a 128K-token context length and has strong multilingual and long-context capabilities.
## DeepSeek-R1-0528-GGUF
lmstudio-community · MIT · Large Language Model · 1,426 downloads · 5 likes

A quantized build of DeepSeek-R1-0528, focused on text generation and packaged for more efficient local use.
## infly inf-o1-pi0 GGUF
bartowski · Large Language Model · Multilingual · 301 downloads · 1 like

A quantized version of the infly/inf-o1-pi0 model for multilingual text generation, produced with llama.cpp's imatrix quantization.
## medgemma-4b-it GGUF
second-state · Other · Text-to-Image · Transformers · 564 downloads · 1 like

medgemma-4b-it is a medical-domain multimodal model that accepts image and text inputs, suitable for scenarios such as radiology and clinical reasoning.
## Devstral-Small-2505 4bit DWQ
mlx-community · Apache-2.0 · Large Language Model · Multilingual · 238 downloads · 3 likes

A 4-bit quantized language model in MLX format, suited to text generation tasks.
## Facebook KernelLLM GGUF
bartowski · Other · Large Language Model · 5,151 downloads · 2 likes

KernelLLM is a large language model developed by Facebook. This release is quantized with llama.cpp's imatrix method and offers multiple quantization options to suit different hardware.
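The "imatrix" in quantized releases like this one refers to an importance matrix gathered from calibration runs: llama.cpp uses per-weight activation statistics to choose quantization parameters that minimize error where it matters most. A toy sketch of the principle for one block, with a mock importance vector (the actual search in llama.cpp is considerably more sophisticated):

```python
import numpy as np

# Toy sketch of importance-weighted quantization: instead of taking the
# plain max-abs scale for a block, search a set of candidate scales for the
# one that minimizes importance-weighted reconstruction error. The
# importances stand in for activation statistics from calibration text.

rng = np.random.default_rng(2)
w = rng.normal(size=32)                    # one block of weights
imp = rng.uniform(0.1, 10.0, size=32)      # mock importance-matrix entries

def weighted_err(w, imp, scale):
    q = np.clip(np.round(w / scale), -127, 127)
    return float(np.sum(imp * (w - q * scale) ** 2))

base_scale = np.abs(w).max() / 127.0       # plain max-abs scale
candidates = np.concatenate(([base_scale], base_scale * np.linspace(0.8, 1.2, 81)))
best_scale = min(candidates, key=lambda s: weighted_err(w, imp, s))

print(weighted_err(w, imp, best_scale) <= weighted_err(w, imp, base_scale))
```

Since the baseline scale is among the candidates, the weighted search can only match or beat it; on important weights that is exactly where the gain lands.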
## VeriReason-Qwen2.5-1.5B-grpo-small GGUF
mradermacher · Large Language Model · English · 48 downloads · 1 like

A static quantization of the Nellyw888/VeriReason-Qwen2.5-1.5B-grpo-small model, focused on Verilog code generation and reasoning tasks.
## a-m-team AM-Thinking-v1 GGUF
bartowski · Apache-2.0 · Large Language Model · 671 downloads · 1 like

A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, available in multiple quantization types for text generation tasks.
## Qwen3-0.6B Llamafile
Mozilla · Apache-2.0 · Large Language Model · 250 downloads · 1 like

Qwen3 is the latest generation of the Qwen large language model series; this 0.6B-parameter dense model delivers notable gains in reasoning, instruction following, agent capabilities, and multilingual support.
## TheDrummer Rivermind-Lux-12B-v1 GGUF
bartowski · Large Language Model · 1,353 downloads · 1 like

A 12B-parameter large language model quantized with llama.cpp's imatrix method, offered in multiple quantized versions to accommodate different hardware.
## Gryphe Pantheon-Proto-RP-1.8-30B-A3B GGUF
bartowski · Apache-2.0 · Large Language Model · English · 2,972 downloads · 6 likes

A llama.cpp quantization of the Gryphe/Pantheon-Proto-RP-1.8-30B-A3B model, suited to role-playing and text generation tasks.
## Qwen3-30B-A3B 4bit DWQ (05082025)
mlx-community · Apache-2.0 · Large Language Model · 240 downloads · 5 likes

A 4-bit quantized conversion of Qwen/Qwen3-30B-A3B to MLX format, suited to text generation tasks.
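4-bit MLX models like this one rely on group-wise affine quantization: each small group of weights shares a scale and an offset, and every weight is stored as a 4-bit index. A sketch of the storage scheme (the group size of 64 and the omission of the DWQ distillation/calibration step are simplifications for illustration):

```python
import numpy as np

# Sketch of group-wise 4-bit affine quantization: each group of weights
# shares a scale and offset; each weight becomes a 4-bit index in 0..15.

GROUP = 64

def quantize_4bit(x):
    g = x.reshape(-1, GROUP)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)  # 4 bits -> 16 levels
    q = np.round((g - lo) / scale).astype(np.uint8)   # indices in 0..15
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    return (q * scale + lo).reshape(-1)

rng = np.random.default_rng(3)
w = rng.normal(size=512).astype(np.float32)
q, s, lo = quantize_4bit(w)
err = float(np.abs(w - dequantize_4bit(q, s, lo)).max())
print("levels used:", int(q.max()) + 1, "| max abs error:", err)
```

Two 4-bit indices pack into one byte, so the weights shrink roughly 4x versus fp16 at the cost of a bounded per-group rounding error.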
## Bielik-1.5B-v3.0-Instruct-GGUF
speakleash · Apache-2.0 · Large Language Model · Other · 341 downloads · 3 likes

A 1.5B-parameter Polish instruction-tuned model from SpeakLeash's Bielik series, suited to text generation tasks.
## Microsoft Phi-4-reasoning-plus GGUF
bartowski · MIT · Large Language Model · Multilingual · 1,516 downloads · 10 likes

A quantized version of Microsoft's Phi-4-reasoning-plus, suited to efficient text generation on devices with limited resources.
## Muyan-TTS Q8_0 GGUF
NikolayKozloff · Speech Synthesis · 80 downloads · 2 likes

Muyan-TTS is a text-to-speech (TTS) model, converted here to GGUF format for use with llama.cpp.
## mlabonne Qwen3-14B-abliterated GGUF
bartowski · Large Language Model · 18.67k downloads · 16 likes

A quantized version of the Qwen3-14B-abliterated model, produced with llama.cpp's imatrix option, suited to text generation tasks.
## Qwen Qwen3-0.6B GGUF
tensorblock · Apache-2.0 · Large Language Model · 905 downloads · 3 likes

GGUF-format model files for Qwen/Qwen3-0.6B, quantized on TensorBlock's machines and compatible with llama.cpp.
## Llamaestra-3.2-1B Translation GGUF
tensorblock · Machine Translation · Multilingual · 5,028 downloads · 1 like

A 1B-parameter language model specialized in English-Italian translation, offered in multiple GGUF quantizations.
## Qwen2.5-7B-Instruct GGUF Llamafile
Bojun-Feng · Apache-2.0 · Large Language Model · English · 441 downloads · 2 likes

Qwen2.5 is the latest Tongyi Qianwen series, spanning base and instruction-tuned models from 0.5B to 72B parameters, with significant improvements in code, mathematics, instruction following, and long-text generation.
## Qwen2-96M
Felladrin · Apache-2.0 · Large Language Model · English · 76 downloads · 2 likes

Qwen2-96M is a miniature language model built on the Qwen2 architecture, with 96 million parameters and an 8192-token context length, suited to English text generation tasks.
## SmolVLM2-2.2B-Instruct i1 GGUF
mradermacher · Apache-2.0 · English · 285 downloads · 0 likes

SmolVLM2-2.2B-Instruct is a 2.2B-parameter vision-language model focused on video-text-to-text tasks, supporting English.
## ritrieve_zh_v1 GGUF
mradermacher · MIT · Large Language Model · Transformers · Chinese · 212 downloads · 1 like

A static quantization of the richinfoai/ritrieve_zh_v1 model that cuts storage and compute requirements while retaining most of the model's performance.